SCIENTIFIC REPORTS




SUBJECT AREAS: 

COMPUTATIONAL 
BIOLOGY AND 
BIOINFORMATICS 

BIOPHYSICS 

SYSTEMS BIOLOGY 

BIOTECHNOLOGY 



Received 
16 November 2012 

Accepted 
3 December 2012 

Published 
3 January 2013



Correspondence and
requests for materials
should be addressed to
C.T.Z. (ctzhang@tju.edu.cn)



A novel triangle mapping technique to
study the h-index based citation
distribution



Chun-Ting Zhang 



Department of Physics, Tianjin University, Tianjin 300072, China. 



The h-index has received wide attention in recent years. The area under the citation function is divided by
the h-index into three parts, representing h-squared, excess and h-tail citations. The h-index by itself
carries no information about the excess and h-tail citations, which can play an even more dominant role than
the h-index in determining the citation curve; it is therefore necessary to examine the relations among them. A
triangle mapping technique is proposed here to map the three percentages of these citations onto a point
within a regular triangle. By viewing the distribution of mapping points, the shapes of the citation functions
can be studied in a perceivable form. As an example, the distribution of the mapping points for the 100 most
prolific economists is studied by this technique.

The h-index, proposed by Hirsch¹ for evaluating the academic impact of individual researchers, has received
wide attention in recent years. The citations received by all papers of a given researcher can be characterized
by a citation distribution function, where the y-axis corresponds to the citations received by a paper, whereas
the x-axis represents the paper rank arranged in descending order of citations (Fig. 1). The distribution of
citations versus paper rank is called the citation distribution function or curve, denoted by C(x), in which
paper x receives C(x) citations. The h-index is simply defined by C(h) = h. The area under the citation
distribution curve is divided by the h-index into two parts: those of the h-core² and the h-tail³. The former is
further divided into another two parts: those of excess citations⁴ and h-squared citations¹. As a consequence,
the total citations are divided into three different parts: h-squared, excess and h-tail citations (Fig. 1).
Indeed, the h-index lacks information on the excess and the h-tail citations, keeping only the citations related
to the h-index (h²). Theoretically, only when h² is dominant among the three parts can the h-index properly
reflect the academic performance of the scientist under study; otherwise, the h-index leads to a biased
evaluation. Whether the h-index dominates the citations depends on the shape of the citation distribution
function.

As pointed out by Bornmann and co-workers⁵, for an isohindex group (scientists having the same h-index),
the associated citation distribution functions may display quite different shapes. Therefore, to study how to
apply the h-index fairly, it is necessary to study the shape of the citation distribution functions, and the
current study aims to address this question by using a triangle mapping technique. One advantage of the citation
triangle method is that different shapes of the citation distribution functions can be compared intuitively. By
viewing the distribution of mapping points within the triangle, the shapes of the citation distribution
functions can be studied in a perceivable manner. Based on this method, we are able to study the degree to which
the h-index is properly applicable. It is hoped that the technique presented here is useful for using the
h-index to evaluate academic performance in a less biased way.

Results 

We here propose a novel triangle mapping technique to study the relations among h-squared, excess and h-tail
citations. For a regular triangle, the sum of the distances from any interior point to the three sides is equal
to a constant, the height of the triangle. Note that the sum of the percentages for h², e² and t² is also a
constant, equal to 1. Based on this characteristic, the percentages for these three kinds of citations are
mapped onto a point in a regular triangle (Fig. 2A). Refer to the Methods section for details.

First of all, let us consider two concrete examples. According to the citation information provided by Dodson⁶,
C_total = 1700, h² = 625 (h = 25), e² = 477 (e = 21.84), and we find H = 0.37, E = 0.28 and T = 0.35. Therefore,
the mapping point corresponding to Dodson is situated in region No. 4, where the h-index is applicable (Fig. 2B).

Figure 1 | The citation distribution curve. The y-axis corresponds to the
citations received by a paper, whereas the x-axis represents the paper rank
arranged in descending order of citations. The area under the citation
distribution curve is divided by the h-index into three parts: h², e² (excess)
and t² (h-tail).

The second example is for the chemist Berni Alder, where h² = 2500
(h = 50), e² = 12996 (e = 114) and C_total = 18400 (ref. 4), and we find
H = 0.14, E = 0.71 and T = 0.15. Therefore, the mapping point
corresponding to Alder is situated in region No. 5, where the e-index
is absolutely dominant (Fig. 2B). This example shows that Alder's
h-index severely underestimates his academic impact, and in this case
the e-index should be used together with the h-index for a fair
evaluation⁴.
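
The arithmetic behind these two examples can be sketched in a few lines of Python. This is only an illustration: the function name `citation_shares` and its argument names are assumptions made here, not part of the original work, and the h-tail citations are obtained as the remainder C_total − h² − e². The printed shares should agree with the values quoted above to within about 0.01.

```python
def citation_shares(h, e_squared, c_total):
    """Return (H, E, T): shares of h-squared, excess and h-tail citations.

    Hypothetical helper; the h-tail citations t^2 are taken as the
    remainder of the total after removing h^2 and e^2.
    """
    h_squared = h * h
    t_squared = c_total - h_squared - e_squared
    return h_squared / c_total, e_squared / c_total, t_squared / c_total


# Figures quoted in the text for the two examples
dodson = citation_shares(h=25, e_squared=477, c_total=1700)
alder = citation_shares(h=50, e_squared=12996, c_total=18400)

for name, (H, E, T) in (("Dodson", dodson), ("Alder", alder)):
    print(f"{name}: H={H:.3f}, E={E:.3f}, T={T:.3f}")
```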

In what follows, let us apply the citation triangle method to study
the citations of the 100 most prolific economists⁷. The data
used to derive the corresponding h-index, e-index and t-index were
kindly provided by Dr. Tol. From these data we calculated the
coordinates x and y of the mapping point corresponding to each
economist. The distribution of the 100 points is shown in Fig. 3A. As
we can see, only two points are situated in region No. 3, i.e., an
h-index dominant region. Meanwhile, only 11 points (11%) are situated
above the horizontal line H = 1/3 or y = 0, where the h-index
is properly applicable. Accordingly, for the remaining cases
(89%), where H < 1/3 or y < 0, the h-index should be used jointly
with the e-index, or even the t-index. The average h-index and e-index
over the 100 points are 19 and 28.14, respectively, corresponding to
average H and E of 0.26 and 0.48 (average x = −0.13, y =
−0.07). Overall, to have a fair and accurate evaluation, the h-index
should be used together with the e-index, or even the t-index, for most of
the 100 most prolific economists.

The h-index captures the information in the citation function only
partially. However, the above distribution of the 100 mapping points
within the triangle provides more information about the shapes of the
corresponding citation functions. For example, the mapping points
within small triangle No. 5 indicate citation distribution
functions that are peaked at the beginning. In contrast, the
mapping points within small triangle No. 9 indicate
citation distribution functions that are flat with a long tail. In both cases,
the h-index does not seem appropriate for capturing the main information
of the citation function. To complement the h-index, Bornmann and
co-workers⁵ introduced three parameters: h² upper, h² center and
h² lower, which correspond to E, H and T, respectively, in this paper.
In other words, the triangle mapping technique provides an intuitive
representation of h² upper, h² center and h² lower. Bornmann and
co-workers⁵ studied the shapes of the citation distribution functions of
three scientists, A, B and C, belonging to an isohindex group with h =
14. For scientist A, E = 0.82, H = 0.15 and T = 0.03, corresponding
to x = −0.456, y = −0.183. The mapping point is situated in small
triangle No. 5, an e-index absolutely dominated region. According to
Bornmann et al.⁵ and Cole and Cole⁸, scientist A is a perfectionist-type
scientist, who has rather few but very highly cited publications.
For scientist B, E = 0.39, H = 0.48 and T = 0.13, corresponding to
x = −0.150, y = 0.147. The mapping point is situated in small
triangle No. 2, a boundary region between the h-index and e-index
dominated regions. According to refs 5 and 8, scientist B is a
prolific-type scientist, who publishes a large number of high-impact
papers. For scientist C, E = 0.10, H = 0.33 and T = 0.57,
corresponding to x = 0.271, y = −0.003. The mapping point is situated in
small triangle No. 8, a t-index dominated region. Scientist C is a
mass producer⁵,⁸, who publishes a large number of papers that are
lowly cited. As the above analysis shows, the locations of the
mapping points carry information about the types of scientists.
Therefore, the triangle mapping technique is particularly useful when
the academic impact of a large number of scientists is studied. In that
case, clustering analysis can be performed based on the mapping point
locations, and scientists can thus be classified according to their
academic performance.

Figure 2 | The citation triangle method in studying h-index based
citations. (A) A regular triangle ABC with its height equal to 1 and its
center situated at O. A Cartesian coordinate system x-y is set up with its
origin at O. The three sides of the triangle are denoted by h, e and t, and the
distances of an interior point P(x, y) to them are equal to H, E and T,
respectively. Therefore, the point P(x, y) is the mapping point for the three
real numbers H, E and T. (B) The regular triangle is divided into nine
smaller regular triangles. The intervals of H, E and T for each of the 9
smaller triangles are shown in Table 1.

Figure 3 | Distributions of mapping points in the citation triangle. (A) The distribution of the 100 mapping
points for the 100 most prolific economists. Note that only 11 points (11%) are situated in the regions where
the h-index is applicable (H > 1/3), indicating that the h-index should be used jointly with the e-index, or
even the t-index, for the remaining 89 economists. (B) An example demonstrating that the power parameter λ is
one of the key factors determining the position of the mapping point. Given C₁ = 512 and N = 100, starting from
region No. 3 (the h-index dominated region) with λ = 0.4, the mapping point moves to region No. 6 (the e-index
dominated region) with λ = 1.6. Interestingly, the track of the mapping points forms a clockwise rotating curve.

Recently, Baum introduced a new parameter, called the Excess-Tail
Ratio⁹, denoted by R, where R = E/T = e²/t². Baum found that for
most cases he studied, R < 1, even R ≪ 1. Only for a few cases, R > 1.
The shapes of citation distribution functions for R > 1 are peaked,
whereas for R < 1 the shapes of the citation functions are flat with a
long tail. Therefore, the Excess-Tail Ratio is an appropriate parameter
to capture the overall shapes of the citation functions. According to
eq. (12), R > 1 or R < 1 corresponds to x < 0 or x > 0, respectively.
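
The link between the Excess-Tail Ratio and the sign of x can be checked numerically. The sketch below assumes the relation x = (T − E)/√3 from eq. (12) and uses the shares quoted above for scientists A and C; the function names are illustrative choices, not taken from Baum's work or from this paper.

```python
import math


def excess_tail_ratio(E, T):
    """R = E / T = e^2 / t^2; undefined when the h-tail holds no citations."""
    if T == 0:
        raise ValueError("T must be positive to form the ratio")
    return E / T


def x_coordinate(E, T):
    """Horizontal coordinate of the mapping point, x = (T - E) / sqrt(3)."""
    return (T - E) / math.sqrt(3)


# Shares (E, T) of scientists A and C as quoted in the text
for label, E, T in (("A", 0.82, 0.03), ("C", 0.10, 0.57)):
    R = excess_tail_ratio(E, T)
    x = x_coordinate(E, T)
    shape = "peaked" if R > 1 else "flat with a long tail"
    print(f"scientist {label}: R={R:.2f}, x={x:.3f}, {shape}")
```

Because x has the sign of T − E, R > 1 always places the point to the left of the vertical axis and R < 1 to the right.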

Discussion 

In what follows, we explore the key factors that determine the
shape of the citation distribution function. As previously, we assume
a simple mathematical model for the citation distribution curve C(x) (ref. 4):

$$C(x) = \frac{C_1}{x^{\lambda}}, \quad C_1 = C(1) > 0, \quad x \ge 1, \quad \lambda > 0. \tag{1}$$

The total citations received by N papers, C_total, is

$$C_{\mathrm{total}} = \int_1^N C(x)\,dx = \frac{C_1\left(N^{1-\lambda} - 1\right)}{1-\lambda}. \tag{2}$$

Table 1 | Intervals of H, E and T for each of the 9 regions (small triangles) within the citation triangle

No.  H            E            T            Feature remark
1    H > 2/3      E + T < 1/3  E + T < 1/3  The h-index absolutely dominated region
2    H > 1/3      E > 1/3      T < 1/3      Boundary between h-index and e-index dominated regions
3    H > 1/3      E < 1/3      T < 1/3      The h-index dominated region
4    H > 1/3      E < 1/3      T > 1/3      Boundary between h-index and t-index dominated regions
5    H + T < 1/3  E > 2/3      H + T < 1/3  The e-index absolutely dominated region
6    H < 1/3      E > 1/3      T < 1/3      The e-index dominated region
7    H < 1/3      E > 1/3      T > 1/3      Boundary between e-index and t-index dominated regions
8    H < 1/3      E < 1/3      T > 1/3      The t-index dominated region
9    H + E < 1/3  H + E < 1/3  T > 2/3      The t-index absolutely dominated region
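
Based on eq. (1) and the definition C(h) = h, it was shown that⁴,¹⁰

$$h = C_1^{1/(1+\lambda)}. \tag{3}$$

However, we should have h < N, which leads to

$$\lambda > \lambda_0 = \frac{\ln C_1}{\ln N} - 1. \tag{4}$$

Meanwhile, we have⁴

$$e^2 = \frac{1}{\lambda - 1}\left(C_1 - \lambda\,C_1^{2/(1+\lambda)}\right). \tag{5}$$

Using eqs. (1)-(5), we find

$$H = \frac{h^2}{C_{\mathrm{total}}} = \frac{(1-\lambda)\,C_1^{(1-\lambda)/(1+\lambda)}}{N^{1-\lambda} - 1}, \quad \lambda > \lambda_0,\ \lambda \ne 1, \tag{6}$$

$$E = \frac{e^2}{C_{\mathrm{total}}} = \frac{\lambda\,C_1^{(1-\lambda)/(1+\lambda)} - 1}{N^{1-\lambda} - 1}, \quad \lambda > \lambda_0. \tag{7}$$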


Therefore, the condition under which the h-index can be dominant
is H > 1/3, or

$$\frac{(1-\lambda)\,C_1^{(1-\lambda)/(1+\lambda)}}{N^{1-\lambda} - 1} > \frac{1}{3}. \tag{8}$$



To have an intuitive picture, we consider some numerical
examples as follows. Taking C₁ = 512, N = 100 and letting
λ = 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6,
respectively, we calculate the values of H and E for each case. Using eq.
(12), we find 12 mapping points in the triangle, as shown in Fig. 3B. It
is interesting to see that, as λ increases, the track of
the mapping points forms a clockwise rotating curve. This example
shows that the power parameter λ is one of the key factors
determining the shape of the citation function. Given C₁ and N, there is a
threshold of λ: in this example H falls from above 1/3 at λ = 0.4 to below 1/3
at λ = 1.6, so when λ is greater than the threshold the h-index can no
longer be properly applied. In fact, as λ → ∞, h → 1.
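
The numerical example can be reproduced with a short Python sketch. It assumes the power-law model C(x) = C₁/x^λ and the resulting closed forms H = (1 − λ)·C₁^((1−λ)/(1+λ))/(N^(1−λ) − 1) and E = (λ·C₁^((1−λ)/(1+λ)) − 1)/(N^(1−λ) − 1), with T = 1 − H − E and the mapping x = (T − E)/√3, y = H − 1/3. The function and variable names are illustrative assumptions.

```python
import math


def shares_power_law(c1, n, lam):
    """H, E, T for the model C(x) = c1 / x**lam on 1 <= x <= n (lam != 1)."""
    if lam == 1:
        raise ValueError("the closed forms used here require lam != 1")
    lam0 = math.log(c1) / math.log(n) - 1  # lower bound ensuring h < n
    if lam <= lam0:
        raise ValueError("lam must exceed lam0 so that h < n")
    core = c1 ** ((1 - lam) / (1 + lam))
    denom = n ** (1 - lam) - 1
    H = (1 - lam) * core / denom
    E = (lam * core - 1) / denom
    return H, E, 1 - H - E


def to_xy(H, E, T):
    """Mapping point for the shares (H, E, T)."""
    return (T - E) / math.sqrt(3), H - 1 / 3


lams = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6]
track = []
for lam in lams:
    H, E, T = shares_power_law(512, 100, lam)
    x, y = to_xy(H, E, T)
    track.append((lam, H, E, T, x, y))
    print(f"lambda={lam:.1f}  H={H:.3f}  E={E:.3f}  T={T:.3f}  x={x:+.3f}  y={y:+.3f}")
```

Under these assumptions H decreases steadily along the track, starting above 1/3 at λ = 0.4 and ending below 1/3 at λ = 1.6, while E rises above 1/3.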

The main contribution of this paper is to propose the citation
triangle method, by which the shapes of citation distribution
functions can be studied in a perceivable form. Based on the distribution
of mapping points, the applicability and limitations of the h-index can be
studied. Generally, the h-index is not properly applicable in the
e-index or t-index dominated regions. In those cases, the h-index
should be applied jointly with the e-index or t-index. The
proposed mapping technique provides a platform to study the
academic impact of a group of scientists, because mathematical
methods such as clustering analysis can be used to study the
distribution of mapping points, and the academic impact of these
scientists can then be classified and compared.

Methods 

The h-index was proposed by Hirsch in 2005¹. The set of h papers of a scientist, each of which has
received at least h citations, is called the h-core². The e-index was proposed by Zhang⁴ and is defined
as the square root of the excess citations over those used for calculating the h-index. Therefore, the total
citations received by the papers in the h-core are equal to h² + e². The h-index divides the total
citations of a scientist into two parts: the first part is that of the h-core, whereas the second
is that of the h-tail³. For convenience, we define the square root of the citations received by
all papers in the h-tail as the t-index. Therefore, the total number of citations received
by all papers of a scientist, C_total, is composed of three parts: h², e² and t², i.e.,

$$C_{\mathrm{total}} = h^2 + e^2 + t^2, \tag{9}$$

where h, e and t are the h-, e- and t-index, respectively. Letting

$$H = h^2/C_{\mathrm{total}}, \quad E = e^2/C_{\mathrm{total}}, \quad T = t^2/C_{\mathrm{total}}, \tag{10}$$

we have

$$H + E + T = 1. \tag{11}$$
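
As an illustration of these definitions, the following Python sketch derives h, e and t from a list of per-paper citation counts. The function name and the sample citation list are hypothetical, chosen only to make the decomposition C_total = h² + e² + t² easy to follow.

```python
import math


def h_e_t_indices(citations):
    """Return (h, e, t) for a list of per-paper citation counts."""
    ranked = sorted(citations, reverse=True)
    # h: number of papers whose citation count is at least their rank
    h = sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)
    core_total = sum(ranked[:h])       # citations of the h-core papers
    e_squared = core_total - h * h     # excess citations within the h-core
    t_squared = sum(ranked[h:])        # citations of the h-tail papers
    return h, math.sqrt(e_squared), math.sqrt(t_squared)


# Hypothetical citation record, for illustration only
sample = [30, 12, 9, 5, 4, 3, 1, 0]
h, e, t = h_e_t_indices(sample)
total = sum(sample)
H, E, T = h * h / total, e * e / total, t * t / total
print(h, round(e * e), round(t * t), H, E, T)
```

For this sample the four most cited papers form the h-core, so the 64 citations split into h² = 16, e² = 40 and t² = 8.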



For any regular triangle, the sum of the distances from any interior point to the three
sides is equal to the height of the triangle. Consider a regular triangle ABC with its
height equal to 1 (Fig. 2A). Let the center of the triangle be denoted by O, and let an x-y
coordinate system be set up as shown in Fig. 2A. Based on eq. (11) and this property of
the regular triangle, the set of three real numbers H, E and T is mapped onto a point
P(x, y) within the triangle, as shown in Fig. 2A. Simple calculation shows that

$$x = (T - E)/\sqrt{3} = (1 - H - 2E)/\sqrt{3}, \qquad y = H - 1/3. \tag{12}$$
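
A minimal Python sketch of this mapping is given below. The function name `map_to_triangle` and the tolerance used to check H + E + T = 1 are assumptions made for illustration; the sample shares are those reported for the three scientists A, B and C of Bornmann and co-workers in the Results.

```python
import math


def map_to_triangle(H, E, T, tol=1e-9):
    """Map the shares (H, E, T) to the point P(x, y) of eq. (12)."""
    if abs(H + E + T - 1) > tol:
        raise ValueError("H, E and T must sum to 1")
    x = (T - E) / math.sqrt(3)
    y = H - 1 / 3
    return x, y


# (H, E, T) for scientists A, B and C as quoted in the Results
for label, shares in (("A", (0.15, 0.82, 0.03)),
                      ("B", (0.48, 0.39, 0.13)),
                      ("C", (0.33, 0.10, 0.57))):
    x, y = map_to_triangle(*shares)
    print(f"scientist {label}: x={x:.3f}, y={y:.3f}")
```

Equal shares H = E = T = 1/3 map to the center O at the origin, and a point with H = 1 maps to the top vertex (0, 2/3).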



The triangle can be divided into 9 smaller triangles (regions) as shown in Fig. 2B. We
denote them by No. 1 through No. 9, respectively. Each region is characterized by a
particular set of intervals of the three real numbers H, E and T. For example, in
region No. 1, H > 2/3 and E + T < 1/3, indicating that h² is absolutely dominant in
this region as compared with e² and t². Similarly, in region No. 5, E > 2/3 and
H + T < 1/3, indicating that e² is absolutely dominant as compared with h² and t². In
region No. 9, T > 2/3 and H + E < 1/3, indicating that t² is absolutely dominant
as compared with h² and e². Furthermore, in region No. 3, H > 1/3, E < 1/3,
T < 1/3, so it is called an h-index dominant region; in region No. 6, E > 1/3,
H < 1/3, T < 1/3, so it is called an e-index dominant region; and in region No. 8,
T > 1/3, H < 1/3, E < 1/3, so it is called a t-index dominant region. Finally,
region No. 2 is the boundary region between the h-index and e-index dominant
regions, region No. 4 is the boundary region between the h-index and t-index
dominant regions, and region No. 7 is the boundary region between the e-index
and t-index dominant regions. The above description reflects the symmetry of a regular
triangle. The full description is summarized in Table 1.
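
The region assignment described above can be sketched as a small Python function. This is an illustrative reading of the nine intervals, with one assumption labeled in the code: points lying exactly on a dividing line (for instance H = E = T = 1/3) are not assigned to any region and yield `None`. The function name is hypothetical.

```python
def region(H, E, T):
    """Return the region number (1-9) for shares (H, E, T), or None on a boundary line."""
    third = 1 / 3
    # Absolutely dominated regions at the three corners
    if H > 2 * third:
        return 1
    if E > 2 * third:
        return 5
    if T > 2 * third:
        return 9
    # Remaining regions: which of the shares exceed 1/3
    above = (H > third, E > third, T > third)
    table = {
        (True, True, False): 2,   # boundary between h- and e-index dominated
        (True, False, False): 3,  # h-index dominated
        (True, False, True): 4,   # boundary between h- and t-index dominated
        (False, True, False): 6,  # e-index dominated
        (False, True, True): 7,   # boundary between e- and t-index dominated
        (False, False, True): 8,  # t-index dominated
    }
    # Assumption: exact ties on a dividing line fall through to None
    return table.get(above)


# Shares quoted in the Results for Dodson, Alder and scientists A, B, C
examples = {
    "Dodson": (0.37, 0.28, 0.35),
    "Alder": (0.14, 0.71, 0.15),
    "A": (0.15, 0.82, 0.03),
    "B": (0.48, 0.39, 0.13),
    "C": (0.33, 0.10, 0.57),
}
for name, shares in examples.items():
    print(name, region(*shares))
```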

The three real numbers H, E and T are the percentages of citations associated with
the h-, e- and t-index, respectively. In general, H should be greater than 1/3 (or y > 0)
for the h-index to be properly applicable; otherwise, if H < 1/3 (or y < 0), the h-index
under-evaluates the academic impact of the researcher concerned. Therefore, the four
regions No. 1, No. 2, No. 3 and No. 4 are the regions where the h-index can be properly
applied (H > 1/3). Regions No. 2, No. 5, No. 6 and No. 7 are the regions where the
e-index can be properly applied (E > 1/3), whereas regions No. 4, No. 7, No. 8 and No. 9
are the regions where the t-index can be properly applied (T > 1/3). In summary, the
h-index can only be properly applied in regions No. 1, No. 2, No. 3 and No. 4
(H > 1/3 or y > 0), and the h-index should be applied jointly with the e-index
or t-index in the remaining regions No. 5 through No. 9 (H < 1/3 or y < 0).